Simple Clustering

Evaluation Parameters

[1] "Evaluating clusters for Yakima Canyon"
[1] "Subset by Biological Year: TRUE"
[1] "Biological Year: 2022"
[1] "Subset by date range: FALSE"
[1] "From table: AnimalID_GPS_Data_AllCollars_2023_04_18"

GPS Clusters

The first step is to evaluate the social organization of collared animals from their GPS locations. Some herds have more obvious population substructure than others. This is a fairly basic analysis, but it can be helpful for high-level grouping. It may not work well for animals whose locations show a multi-modal distribution (i.e., multiple distinct activity centers), because a single median location summarizes such animals poorly.

To do this, we’ll read data from the database, subset it to our specified parameters (biological year, date range), and then compute the median location for each animal over the time period of interest.
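As a minimal sketch of the per-animal summary described above, the block below computes median locations and per-axis standard deviations with base R. The toy data and the column names `AnimalID`, `X`, and `Y` are assumptions for illustration, not the actual database schema; the output names (`xy`, `df.m`, `medX`, `sdX`, and so on) simply mirror those used later in this report.

```r
# Toy GPS fixes for three hypothetical animals (projected coordinates, metres).
# AnimalID, X and Y are assumed column names, not the real table layout.
set.seed(1)
gps <- data.frame(
  AnimalID = rep(c("A1", "A2", "A3"), each = 20),
  X = c(rnorm(20, 0, 500), rnorm(20, 300, 500), rnorm(20, 10000, 500)),
  Y = c(rnorm(20, 0, 500), rnorm(20, -200, 500), rnorm(20, 8000, 500)),
  stringsAsFactors = FALSE
)

# Median location and per-axis standard deviation for each animal
med <- aggregate(cbind(medX = X, medY = Y) ~ AnimalID, data = gps, FUN = median)
sdv <- aggregate(cbind(sdX = X, sdY = Y) ~ AnimalID, data = gps, FUN = sd)
df.m <- merge(med, sdv, by = "AnimalID")

# One row per animal, with AnimalID as row names, ready for dist()
xy <- data.frame(medX = df.m$medX, medY = df.m$medY, row.names = df.m$AnimalID)
xy
```

The median is less sensitive than the mean to occasional long-distance forays, which is one reason it is a reasonable single-point summary for animals with one main activity center.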

Viewing our GPS locations

Computing Clusters

Next, we’ll compute a distance matrix from the median locations and apply hierarchical clustering with ‘hclust’, then use ‘cutree’ to assign animals to clusters. The ‘h’ (height) argument of ‘cutree’ cuts the cluster dendrogram at a specific value, rather than requesting a set number of clusters through the ‘k’ argument, so the number of clusters follows from the height we choose. We can set the value of ‘d’ below from prior knowledge or from the distance we want to treat as the threshold for cluster membership. Here ‘d’ is set to 3x the median, across animals, of the per-animal standard deviation in the location data (x and y combined). Alternatively, we could use several tests to optimize the choice of ‘k’ for each data set and agglomeration method. In exploratory analyses, the UPGMA (“average”) method provided the best fit.

# packages used in this chunk
  library(sp)       # SpatialPointsDataFrame
  library(stringr)  # str_flatten
  library(knitr)    # kable

# perform clustering on the per-animal median locations (UPGMA / average linkage)
  p.dist <- dist(xy)
  chc <- hclust(p.dist, method="average")
  
  xy.sp <- SpatialPointsDataFrame(matrix(c(xy$medX,xy$medY), ncol=2), 
                                  data.frame(AnimalID=rownames(xy)), proj4string=crs.projection)
  
# Distance threshold, larger value will yield fewer clusters
#   Fixed alternative (commented out): 6-7k, ~ axis of typical Lookout Mountain home range
  #d <- 6000
#   Used here: 3x the median per-animal standard deviation of locations
  d <- 3 * median(sqrt(df.m$sdX^2+df.m$sdY^2))
  chc.d5k <- cutree(chc, h=d)   # cut the dendrogram at height d
  nclust <- max(chc.d5k)        # number of clusters produced by the cut
  
# Join results to display sp points
  xy.sp@data <- data.frame(xy.sp@data, Clust=chc.d5k)
 
# Cluster membership, ordered
  rownames(xy.sp@data) <- NULL
  clusters <- sort(unique(xy.sp@data$Clust))
  members <- character(length(clusters))
  for (i in seq_along(clusters)){
    membs <- xy.sp@data$AnimalID[xy.sp@data$Clust==clusters[i]]
    members[i] <- str_flatten(membs, collapse=", ")
  }
  kable(data.frame(Cluster=clusters,Members=members),align='ll')
Cluster  Members
1        23BS5651, 23BS5660, 23BS5670, 23BS5677, 23BS5707, 23BS5725
2        23BS5652, 23BS5654, 23BS5657, 23BS5669, 23BS5671, 23BS5672, 23BS5674, 23BS5678, 23BS5679, 23BS5680, 23BS5681, 23BS5682, 23BS5702, 23BS5709, 23BS5715, 23BS5722, 23BS5726
3        23BS5655, 23BS5665
4        23BS5658, 23BS5668, 23BS5684, 23BS5700, 23BS5723, 23BS5724
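The text above mentions that the choice of ‘k’ could be optimized instead of cutting at a fixed height. The sketch below contrasts the two kinds of cut on toy data and uses average silhouette width (from the ‘cluster’ package) as one possible test; it is an illustration under assumed data, not the procedure used to produce the table above. The toy coordinates, the 6000 m threshold, and the range of ‘k’ values are all assumptions.

```r
library(cluster)  # silhouette()

set.seed(42)
# Toy median locations for 12 hypothetical animals in three separated groups (metres)
centres <- data.frame(x = c(0, 20000, 40000), y = c(0, 15000, 0))
xy <- data.frame(
  medX = rep(centres$x, each = 4) + rnorm(12, sd = 800),
  medY = rep(centres$y, each = 4) + rnorm(12, sd = 800)
)
rownames(xy) <- paste0("Animal", 1:12)

p.dist <- dist(xy)
chc <- hclust(p.dist, method = "average")

# Cut by height: the number of clusters follows from the distance threshold
by.height <- cutree(chc, h = 6000)

# Cut by count: the number of clusters is fixed in advance
by.k <- cutree(chc, k = 3)

# Average silhouette width for several candidate values of k
ks <- 2:6
avg.sil <- sapply(ks, function(k) {
  sil <- silhouette(cutree(chc, k = k), p.dist)
  mean(sil[, "sil_width"])
})
best.k <- ks[which.max(avg.sil)]

data.frame(k = ks, avg.sil = round(avg.sil, 3))
```

A height cut is convenient when there is a biologically meaningful distance (such as a typical home range axis), while a silhouette-style criterion is more useful when no such distance is known in advance.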

Plotting Raw Cluster Dendrograms

Plotting Cut Cluster Dendrograms

Viewing the Clusters on a Map

Evaluate our results visually:

Home Range Overlap

Computing home ranges

Here we compute home ranges for every GPS-collared animal in the herd using the ‘adehabitatHR’ package and either a bivariate normal or a Brownian bridge kernel function. Then the overlap between each pair of animals (by area or by UD) is calculated and stored in a matrix. The kernel function, the minimum number of fixes, and the contour level used to derive the home range from the utilization distribution (UD) are set in the user-input section at the head of this .Rmd file.

if (HRestimator=="BB"){
  homeranges <- calculateBBHomerange(gps.sf, min.fixes=min.fixes, contour.percent=contour.percent, output.proj=projection)
} else {
  homeranges <- calculateHomerange(gps.sf, min.fixes=min.fixes, contour.percent=contour.percent, output.proj=projection)
}
[1] "Kernel function: Brownian bridge"
[1] "Total rows in GPS table for HR calculation:  4436"
[1] "Date range for HR calculation:  2023-01-18 22:00:37  to  2023-04-18 16:01:07"
[1] "Contour level is set to:  75 %"

Map of home ranges

Evaluating amount of overlap between animals

This plot shows the amount of overlap between each pair of animals. It gets messy with large numbers of individuals, so it probably makes more sense to explore the relationships between animals by using this measure in a clustering algorithm.

Home range overlap
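For readers who want to see how a pairwise overlap matrix of this kind can be built, here is a minimal sketch using ‘adehabitatHR’ directly. It uses the bivariate normal kernel (not the Brownian bridge option) and the area-based "HR" overlap method; the toy relocations, animal IDs, and the choice of `h = "href"` are assumptions, and the report's own helper functions are not reproduced here.

```r
library(sp)
library(adehabitatHR)

set.seed(7)
# Toy relocations for three hypothetical animals; A and B share space, C is offset
n <- 60
pts <- data.frame(
  id = factor(rep(c("A", "B", "C"), each = n)),
  x = c(rnorm(n, 0, 300), rnorm(n, 200, 300), rnorm(n, 1500, 300)),
  y = c(rnorm(n, 0, 300), rnorm(n, 100, 300), rnorm(n, 1500, 300))
)

# kernelUD expects a SpatialPointsDataFrame whose only column is the animal ID
pts.sp <- SpatialPointsDataFrame(as.matrix(pts[, c("x", "y")]), data = pts["id"])

# Estimate all utilization distributions on one shared grid so they are comparable
ud <- kernelUD(pts.sp, h = "href", same4all = TRUE)

# Proportion of the row animal's 75% home range covered by the column animal's
overlap <- kerneloverlaphr(ud, method = "HR", percent = 75, conditional = TRUE)
round(overlap, 2)
```

Because each row is scaled by that animal's own home range area, the matrix is generally not symmetric: a small home range can sit almost entirely inside a large one while covering only a small part of it.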

Clustering from Overlap

Using overlap as a measure of connectivity between animals

In the last tab, we computed a matrix containing the fraction of each animal’s home range (by row) that falls within the home range of every other animal in the herd (by column). Treating this as a weighted adjacency matrix, we can use tools from the ‘igraph’ network analysis package to map clusters, viewing the data as a directed social network in which the connection between animals is weighted by the amount of home range overlap. Note that in a directed network, the connection from A to B can differ from the connection from B to A, which matches our data. Here we show the adjacency matrix clustered with a hierarchical walktrap method and displayed in two plots: 1) a plot of the network and 2) a dendrogram. The group colors in the network plot match the leaf text colors in the dendrogram.
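The approach described above can be sketched with ‘igraph’ as follows. The overlap matrix here is toy data with hypothetical animal IDs, and the default walktrap settings are assumed; consult the ‘igraph’ documentation for how `cluster_walktrap` treats edge direction before relying on the directed interpretation.

```r
library(igraph)

# Toy overlap matrix for six hypothetical animals:
# row i, column j = fraction of i's home range inside j's home range
ids <- paste0("Animal", 1:6)
ov <- matrix(0.02, nrow = 6, ncol = 6, dimnames = list(ids, ids))
ov[1:3, 1:3] <- 0.8
ov[4:6, 4:6] <- 0.7
ov[1, 2] <- 0.9   # asymmetric pair: most of Animal1's range lies inside Animal2's
ov[2, 1] <- 0.6   # but less of Animal2's range lies inside Animal1's
diag(ov) <- 0     # ignore self-overlap

# Directed graph with overlap as the edge weight
g <- graph_from_adjacency_matrix(ov, mode = "directed", weighted = TRUE, diag = FALSE)

# Hierarchical community detection by short random walks
wt <- cluster_walktrap(g, weights = E(g)$weight)

# Cluster membership per animal
grp <- as.integer(membership(wt))
names(grp) <- V(g)$name
split(names(grp), grp)
```

Higher weights make a random walk more likely to stay among strongly overlapping animals, which is why groups with heavy mutual overlap tend to end up in the same community.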

Network plot

Dendrogram

Display cluster membership

Cluster  Members
1        23BS5655, 23BS5665
2        23BS5658, 23BS5668, 23BS5684, 23BS5700, 23BS5723, 23BS5724
3        23BS5660, 23BS5677
4        23BS5651, 23BS5670, 23BS5707, 23BS5725
5        23BS5652, 23BS5654, 23BS5671, 23BS5678, 23BS5702, 23BS5709, 23BS5722
6        23BS5657, 23BS5669, 23BS5672, 23BS5674, 23BS5679, 23BS5680, 23BS5681, 23BS5682, 23BS5715, 23BS5726

Linking Testing Data to Networks

Combining testing results with the social network

Now that we have mapped out social groups based on overlap in individual home ranges, we can add the results of testing for Movi to our display by linking each animal back to the test result records in our database. Note that these plots are interactive.
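The linking step can be sketched as a simple join on animal ID. Everything in this block is hypothetical: the IDs, the `PCR` column, the status labels, and the colors are assumptions for illustration and do not reflect the actual database records or results.

```r
# Hypothetical cluster membership from the network analysis
members <- data.frame(
  AnimalID = c("A1", "A2", "A3", "A4"),
  Clust = c(1, 1, 2, 2),
  stringsAsFactors = FALSE
)

# Hypothetical test records; not every collared animal has one
tests <- data.frame(
  AnimalID = c("A1", "A3", "A4"),
  PCR = c("Detected", "Not detected", "Detected"),
  stringsAsFactors = FALSE
)

# Left join keeps every animal in the network, even those without a test record
linked <- merge(members, tests, by = "AnimalID", all.x = TRUE)
linked$PCR[is.na(linked$PCR)] <- "Not tested"

# Map each status to a display color for the network vertices
status.cols <- c("Detected" = "red", "Not detected" = "grey60", "Not tested" = "white")
linked$color <- unname(status.cols[linked$PCR])

# Cross-tabulate status by social group
table(linked$Clust, linked$PCR)
```

Keeping untested animals in the join, rather than dropping them, avoids silently removing vertices from the network display.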

Network plot with ELISA Status

Network plot with PCR Status